Abstract

We tackle the problem of estimating a regression function observed in an
instrumental regression framework. This model is an inverse problem with unknown
operator. We provide a spectral cut-off estimation procedure and derive oracle
inequalities which guarantee that our estimate, built without any prior
knowledge, behaves, up to a log term, as well as if the best model were known.

Introduction

An economic relationship between a response variable $Y$ and a vector of explanatory
variables $X$ is often represented by an equation

$Y = \varphi(X) + U,$

where $\varphi$ is the parameter of interest which models the relationship while $U$ is an error
term. Contrary to usual statistical regression models, the error term is correlated with
the explanatory variables $X$, hence $E(U|X) \neq 0$, preventing direct estimation of $\varphi$. To
overcome the endogeneity of $X$, we assume that there exists an observed random variable
$W$, called the instrument, which decorrelates the effects of the two variables $X$ and $Y$
in the sense that $E(U|W) = 0$. This is often the case in economics, where the practical
construction of instrumental variables plays an important part. For instance, [CIN07]
presents practical situations where prices and quantities of goods can be explained
using an instrument. This situation is also encountered when dealing with simultaneous
equations, errors-in-variables models, or treatment models with endogenous effects. It defines
the so-called instrumental variable regression model, which has received growing interest
over the last decade and has turned out to be a challenging issue in statistics. In particular, we
refer to [HN91], [NP03], [Flo03] for general references on the use of instrumental variables
in economics, while [HH05], [DFR03] and [FJvB07] deal with the statistical estimation
problem.

More precisely, we aim at estimating a function $\varphi$ observed in the following observation
model

$Y = \varphi(X) + U, \quad E(U|X) \neq 0, \quad E(U|W) = 0.$ (1)

Hence, the model (1) can be rewritten as an inverse problem using the conditional
expectation operator with respect to $W$, which will be denoted $T$, as follows:

$r := E(Y|W) = E(\varphi(X)|W) = T\varphi.$ (2)



The function $r$ is not known and only a noisy observation $\hat r$ is available, leading to the inverse
problem $\hat r = T\varphi + \delta$, where $\varphi$ is defined as the solution of a noisy Fredholm equation
of the first kind, which may generate an ill-posed inverse problem. The literature on
inverse problems in statistics is large, but contrary to most of the problems tackled in this
literature (see [EHN96], [MR96], [CGPT02], [CHR03], [LL08] and
[O'S86] for general references), the operator $T$ is also unknown, which transforms the
model into an inverse problem with unknown operator. Few results exist in this setting
and new methods have arisen only very recently. In particular [CH05], [Mar06, Mar08], or
[EK01] and [HR08] in a more general case, construct estimators which solve
inverse problems with unknown operators in an adaptive way, i.e. achieving optimal rates
of convergence without prior knowledge of the regularity of the functional parameter of
interest.

In this work, we face an even more difficult situation since both $r$ and the
operator $T$ have to be estimated from the same sample. Some attention has been paid to this
estimation issue, with different kinds of techniques such as kernel-based Tikhonov
regularization [DFR03] or [HH05], regularization in Hilbert scales [FJvB07], and finite-dimensional
sieve minimum distance estimators [NP03], with different rates and different smoothness
assumptions, sometimes providing minimax rates of convergence. But, to our knowledge,
all the proposed estimators rely on prior knowledge of the regularity of the function $\varphi$,
expressed through an embedding condition into a smoothness space or a Hilbert scale,
or through a condition linking the regularity of $\varphi$ to the regularity of the operator, namely a link
condition or source condition (see [CR08] for general and insightful comments
on such assumptions).

Hence, in this paper, we provide, under some conditions, an adaptive estimation
procedure for the function $\varphi$ which converges, without prior regularity assumption, at the
optimal rate of convergence, up to a logarithmic term. Moreover, we derive an oracle
inequality which ensures optimality among the different choices of estimators.

The article is organized as follows. Section 1 is devoted to the mathematical
presentation of the instrumental variable framework and the construction of the estimator.
Section 2 provides the asymptotic behaviour of this adaptive estimate as well as an oracle
inequality, while technical lemmas and proofs are gathered in Sections 3 and 4.

1 Inverse Problem for IV regression 

We observe an i.i.d. sample $(Y_i, X_i, W_i)$, $i = 1, \dots, n$, with unknown distribution $f_{(Y,X,W)}$.
Define the following Hilbert spaces

$L^2_X = \{h : \mathbb{R}^d \to \mathbb{R},\ \|h\|_X^2 := E(h^2(X)) < +\infty\},$

$L^2_W = \{g : \mathbb{R}^d \to \mathbb{R},\ \|g\|_W^2 := E(g^2(W)) < +\infty\},$

with the corresponding scalar products $\langle \cdot, \cdot \rangle_X$ and $\langle \cdot, \cdot \rangle_W$. Then the conditional
expectation operator of $X$ with respect to $W$ is defined as the operator

$T : L^2_X \to L^2_W, \quad g \mapsto E(g(X)|W).$



The model (1) can be written, as discussed in [CR08], as

$Y_i = \varphi(X_i) + E[\varphi(X_i)|W_i] - E[\varphi(X_i)|W_i] + U_i$

$\phantom{Y_i} = E[\varphi(X_i)|W_i] + V_i$

$\phantom{Y_i} = T\varphi(W_i) + V_i,$ (3)

where $V_i = \varphi(X_i) - E[\varphi(X_i)|W_i] + U_i$ is such that $E(V|W) = 0$. The parameter of interest
is the unknown function $\varphi$. Hence, the observation model turns out to be an inverse problem
with unknown operator $T$ and a correlated noise $V$. Solving this issue amounts to dealing
with the estimation of the operator and then controlling the correlation with respect to
the noise.
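
The centring of the noise $V$ can be checked in one line. The following short derivation is a sketch which only uses the tower property of conditional expectation together with the assumption $E(U|W) = 0$ of model (1):

```latex
\begin{align*}
E(V_i \mid W_i)
  &= E\big(\varphi(X_i) \mid W_i\big)
     - E\big(E[\varphi(X_i) \mid W_i] \mid W_i\big)
     + E(U_i \mid W_i) \\
  &= E\big(\varphi(X_i) \mid W_i\big)
     - E\big(\varphi(X_i) \mid W_i\big) + 0
   = 0 .
\end{align*}
```

Note that this identity says nothing about independence: $V_i$ still depends on $X_i$, which is why the noise remains correlated with the observations.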

The operator $T$ is unknown and depends on the unknown distribution $f_{(Y,X,W)}$ of the observed
variables. The estimation of an operator can be performed either by directly using
an estimate of $f_{(Y,X,W)}$ or, if it exists, by estimating the singular value decomposition of the
operator.

Assume that $T$ is compact and admits a singular value decomposition (SVD) $(\lambda_j, \phi_j, \psi_j)_{j \ge 1}$,
which provides a natural basis adapted to the operator for representing the function $\varphi$,
see for instance [EHN96]. More precisely, let $T^*$ be the adjoint operator of $T$; then $T^*T$
is a compact operator on $L^2_X$ with eigenvalues $\lambda_j^2$, $j \ge 1$, associated with the corresponding
eigenfunctions $\phi_j$, while the $\psi_j$ are defined by $\psi_j = T\phi_j / \|T\phi_j\|$. So we obtain

$T\phi_j = \lambda_j \psi_j, \quad T^*\psi_j = \lambda_j \phi_j.$

We can write the following decompositions

$r(w) = E(Y|W = w) = T\varphi(w) = \sum_{j \ge 1} \lambda_j \langle \varphi, \phi_j \rangle \psi_j(w),$ (4)

and

$r(w) = \sum_{j \ge 1} r_j \psi_j(w),$ (5)

with $r_j = \langle Y, \psi_j \rangle$, which can be estimated by

$\hat r_j = \frac{1}{n} \sum_{i=1}^{n} Y_i \psi_j(W_i).$

Hence the noisy observations are the $\hat r_j$'s, which will be used to estimate the regression
function $\varphi$ in an inverse problem framework.

In a very general framework, full estimation of an operator is a hard task; hence we
restrict ourselves to the case where the SVD of the operator is partially known, in the
sense that the eigenvalues $\lambda_j$ are unknown but the eigenvectors $\phi_j$ and $\psi_j$ are known.

Note that this assumption is often met in the special case of deconvolution. Consider
the case where the unknown function reduces to the identity. Hence model (1) reduces
to the usual deconvolution model

$Y = X + U.$

Let $f_U$ be the unknown density of the noise $U$ and assume that $f_U \in L^2(\mathbb{R})$ is a 1-periodic
function. Let also $T_U$ be the convolution operator defined by $T_U g = g \star f_U$. In this special
case, the spectral decomposition of the operator $T_U$ is known: it is given by the unitary Fourier
transform, and the functions of the usual real trigonometric basis on $[0, 1]$ are the eigenvectors.

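If the operator were known, we could provide an estimator using the spectral
decomposition of the function as follows. For a given decomposition level $m$, define the projection
estimator (also called spectral cut-off [EHN96])

$\hat\varphi^0_m = \sum_{j=1}^{m} \frac{1}{\lambda_j} \hat r_j \phi_j.$ (6)

Since the $\lambda_j$'s are unknown, we first build an estimator of the eigenvalues. For this, using
the decomposition (4), we obtain

$\lambda_j = \langle T\phi_j, \psi_j \rangle_W$
$\phantom{\lambda_j} = E[T\phi_j(W)\psi_j(W)]$
$\phantom{\lambda_j} = E[E[\phi_j(X)|W]\psi_j(W)]$
$\phantom{\lambda_j} = E[\phi_j(X)\psi_j(W)].$

So the eigenvalue $\lambda_j$ can be estimated by

$\hat\lambda_j = \frac{1}{n} \sum_{i=1}^{n} \psi_j(W_i)\phi_j(X_i).$ (7)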

As studied in [CH05], replacing directly the eigenvalues by their estimates in (6) does not
yield a consistent estimator; hence, using the same strategy, we define an upper bound
for the resolution level

$M = \inf\{k \le N : |\hat\lambda_k| \le \frac{1}{\sqrt{n}} \log n\} - 1,$ (8)

for $N$ to be chosen later. The parameter $N$ provides an upper bound for $M$ in order to
ensure that $M$ is not too large. The main idea behind this definition is that when the
estimates of the eigenvalues are too small with respect to the observation noise, trying
to still provide an estimate of the inverse $\lambda_k^{-1}$ only amplifies the estimation error.
To avoid this trouble, we truncate the sequence of estimated eigenvalues when their
estimate is too small, i.e. smaller than the noise level. We point out that this parameter $M$
is a random variable which we will have to control. More precisely, define two deterministic
lower and upper bounds $M_0$, $M_1$ as

$M_0 = \inf\{k : |\lambda_k| \le \frac{1}{\sqrt{n}} \log^2 n\} - 1,$ (9)

and

$M_1 = \inf\{k : |\lambda_k| \le \frac{1}{\sqrt{n}} \log^{3/4} n\};$ (10)

we will show in Section 3 that, with high probability, $M_0 \le M < M_1$.

Now, thresholding the spectral decomposition in (6) leads to the following estimator

$\hat\varphi_m = \sum_{j=1}^{m} \frac{1}{\hat\lambda_j} \hat r_j \mathbf{1}_{\{j \le M\}} \phi_j.$ (11)

The asymptotic behaviour of this estimate depends on the choice of $m$. In the next
section, we provide an optimal procedure to select the parameter $m$ that gives rise to an
adaptive estimator $\varphi^\star$ and an oracle inequality.
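
The construction of this section can be sketched numerically. The following is a minimal illustration and not the authors' implementation: it assumes, purely for the example, that both families of eigenfunctions are the cosine basis $\sqrt{2}\cos(\pi j x)$ on $[0, 1]$, and the function names (`cosine_basis`, `empirical_coefficients`, `cutoff_M`, `spectral_cutoff`) are hypothetical. When no estimated eigenvalue falls below the threshold, the sketch sets $M = N$, which is a convention chosen here.

```python
import numpy as np

def cosine_basis(j, x):
    """Assumed eigenfunctions (illustration only): sqrt(2) * cos(pi * j * x)."""
    return np.sqrt(2.0) * np.cos(np.pi * j * x)

def empirical_coefficients(Y, X, W, N):
    """Return (r_hat, lam_hat) for j = 1..N as empirical means over the sample."""
    j = np.arange(1, N + 1)
    phi = cosine_basis(j[None, :], X[:, None])   # phi_j(X_i), shape (n, N)
    psi = cosine_basis(j[None, :], W[:, None])   # psi_j(W_i), shape (n, N)
    r_hat = (Y[:, None] * psi).mean(axis=0)      # (1/n) sum_i Y_i psi_j(W_i)
    lam_hat = (phi * psi).mean(axis=0)           # (1/n) sum_i psi_j(W_i) phi_j(X_i)
    return r_hat, lam_hat

def cutoff_M(lam_hat, n):
    """M = inf{k <= N : |lam_hat_k| <= log(n)/sqrt(n)} - 1 (M = N if no such k)."""
    small = np.flatnonzero(np.abs(lam_hat) <= np.log(n) / np.sqrt(n))
    # small[0] is the 0-based index of the first small eigenvalue, i.e. k - 1.
    return int(small[0]) if small.size else lam_hat.size

def spectral_cutoff(x, r_hat, lam_hat, m, M):
    """Evaluate sum_{j <= min(m, M)} (r_hat_j / lam_hat_j) * phi_j(x)."""
    J = min(m, M)
    j = np.arange(1, J + 1)
    coef = r_hat[:J] / lam_hat[:J]
    return cosine_basis(j[None, :], np.atleast_1d(x)[:, None]) @ coef
```

The truncation at `min(m, M)` plays the role of the indicator in (11): coefficients whose estimated eigenvalue is at or below the noise level $n^{-1/2}\log n$ are never inverted.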

2 Main result 

Consider the following assumptions on both the data $Y_i$, $i = 1, \dots, n$ and the
eigenfunctions $\phi_k$ and $\psi_k$ for $k \ge 1$.

Bounded SVD functions: There exists a finite constant $C_1$ such that

$\forall j \ge 1, \quad \|\phi_j\|_\infty < C_1, \quad \|\psi_j\|_\infty < C_1.$ (12)

Exponential moment conditions: The observations $Y$ satisfy the following moment
condition. There exist positive numbers $v \ge E(Y_j^2)$ and $c$ such that

$\forall j \ge 1,\ \forall k \ge 2, \quad E(|Y_j|^k) \le \frac{k!}{2} v c^{k-2}.$ (13)

These two conditions are required in order to obtain concentration bounds using first
a Hoeffding-type inequality, then Bernstein's inequality, see for instance [vdG00]. Requiring
bounded SVD functions may be seen as a restrictive condition. Yet it is met when
the eigenvectors are trigonometric functions. Moreover, this condition can also be
turned into a moment condition if we replace the concentration bound by a Bernstein-type
inequality. Note also that the moment conditions on $Y$ amount to requiring a bounded
regression function and equivalent moment conditions on the errors $U_j$.

IP: Degree of ill-posedness. We assume that there exists $t$, called the degree of
ill-posedness of the operator, which controls the decay of the eigenvalues of the operator
$T$. More precisely, there are constants $\lambda_L$, $\lambda_U$ such that

$\lambda_L k^{-t} \le \lambda_k \le \lambda_U k^{-t}, \quad \forall k \ge 1.$ (14)

In this paper, we only consider the case of mildly ill-posed inverse problems, i.e. when
the eigenvalues decay at a polynomial rate. This assumption, also required in [CH05],
is needed when comparing the residual error of the estimator with the risk in order to
obtain the oracle inequality.

Enough ill-posedness: Let $\sigma_j^2 = \mathrm{Var}(Y\psi_j(W))$. We assume that there exist two
positive constants $\sigma_L^2$ and $\sigma_U^2$ such that

$\forall j \ge 1, \quad \sigma_L^2 \le \sigma_j^2 \le \sigma_U^2.$ (15)

Note that Condition (13) implies the upper bound of Condition (15). The lower bound
is similar to the variance condition in Assumption 3.1 in [CR08]. We also point out
that this condition is not needed when building an estimator of the regression function.
However, it becomes necessary when obtaining the lower bound to get a minimax result, or
when obtaining an oracle inequality.

2.1 Oracle inequality

First, let $R_0(m, \varphi)$ be the quadratic estimation risk for the naive estimator (6), defined
by

$R_0(m, \varphi) = \sum_{k > m} \varphi_k^2 + \frac{1}{n} \sum_{k=1}^{m} \lambda_k^{-2} \sigma_k^2,$

where $\varphi_k = \langle \varphi, \phi_k \rangle$. The best model would be obtained by choosing a minimizer of this quantity, namely

$m_0 = \arg\min_m R_0(m, \varphi).$ (16)

This risk depends on the unknown function $\varphi$, hence $m_0$ is the oracle. We aim at
constructing an estimator of $R_0(m, \varphi)$ which, by minimization, could give rise to a convenient
choice for $m$, i.e. as close as possible to $m_0$. The first step would be to replace the $\varphi_k$ by their
estimates $\hat\lambda_k^{-1}\hat r_k$ and to take as estimator of $\sigma_k^2$ the quantity $\hat\sigma_k^2$ defined by

$\hat\sigma_k^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i\psi_k(W_i) - \hat r_k)^2.$

This would lead us to consider the empirical risk, for any $m \le M$ (the cut-off which
guarantees a good behaviour of the $\hat\lambda_j$'s),

$U_0(m, r, \hat\lambda) = -\sum_{k=1}^{m} \hat\lambda_k^{-2} \hat r_k^2 + \frac{c}{n} \sum_{k=1}^{m} \hat\lambda_k^{-2} \hat\sigma_k^2, \quad \forall m \in \mathbb{N},$

for a well chosen constant $c$. The corresponding random oracle within the range of models
which are considered would be

$m_1 = \arg\min_{m \le M} R_0(m, \varphi).$ (17)

Unfortunately, the correlation between the errors $V_i$ and the observations $Y_i$ prevents an
estimator defined as a minimizer of $U_0(m, r, \hat\lambda)$ from achieving the quadratic risk $R_0(m, \varphi)$.
Indeed, we have to use a stronger penalty, leading to an extra error in the estimation that
shall be discussed later in the paper. More precisely, $c$ in the penalty is no longer a constant
but is allowed to depend on the number of observations $n$.

Hence, now define the penalized estimation risk $R(m, \varphi)$ as

$R(m, \varphi) = \sum_{k > m} \varphi_k^2 + \frac{\log^2 n}{n} \sum_{k=1}^{m} \lambda_k^{-2} \sigma_k^2, \quad \forall m \in \mathbb{N}.$ (18)

The best choice for $m$ would be a minimizer of this quantity, which however depends on the
unknown regression function $\varphi$. Hence, to mimic this risk, define the following empirical
criterion

$U(m, r, \hat\lambda) = -\sum_{k=1}^{m} \hat\lambda_k^{-2} \hat r_k^2 + \frac{\log^2 n}{n} \sum_{k=1}^{m} \hat\lambda_k^{-2} \hat\sigma_k^2, \quad \forall m \in \mathbb{N}.$ (19)

Then, the best estimator is selected by minimizing this quantity as follows

$m^\star := \arg\min_{m \le M} U(m, r, \hat\lambda).$ (20)

Finally, the corresponding adaptive estimator $\varphi^\star$ is defined as:

$\varphi^\star = \sum_{k=1}^{m^\star} \hat\lambda_k^{-1} \hat r_k \phi_k.$ (21)
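
The selection rule (19)-(20) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `select_cutoff` and its argument layout are hypothetical, the matrix `psi_W` is assumed to hold the values $\psi_k(W_i)$, the minimization runs over $m = 1, \dots, M$, and the value returned when $M < 1$ is a convention of the sketch.

```python
import numpy as np

def select_cutoff(Y, psi_W, r_hat, lam_hat, M):
    """Return (m_star, U), where U[m-1] is the penalized criterion for m = 1..M.

    psi_W[i, k-1] holds psi_k(W_i); r_hat and lam_hat hold the empirical
    coefficients for k = 1..N with N >= M.
    """
    if M < 1:
        # No admissible model: convention of this sketch.
        return 0, np.empty(0)
    n = Y.shape[0]
    # sigma_hat_k^2 = (1/n) sum_i (Y_i psi_k(W_i) - r_hat_k)^2
    sigma2_hat = ((Y[:, None] * psi_W[:, :M] - r_hat[:M]) ** 2).mean(axis=0)
    inv_lam2 = 1.0 / lam_hat[:M] ** 2
    # -sum_{k<=m} lam_hat_k^{-2} r_hat_k^2, for every m at once
    fit = -np.cumsum(inv_lam2 * r_hat[:M] ** 2)
    # (log^2 n / n) sum_{k<=m} lam_hat_k^{-2} sigma_hat_k^2
    penalty = (np.log(n) ** 2 / n) * np.cumsum(inv_lam2 * sigma2_hat)
    U = fit + penalty
    m_star = int(np.argmin(U)) + 1
    return m_star, U
```

Cumulative sums give all values $U(1, r, \hat\lambda), \dots, U(M, r, \hat\lambda)$ in a single pass, so the cost of the selection step is linear in $M$ once the coefficients are available.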

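The performance of $\varphi^\star$ is described in the following theorem.

Theorem 2.1. Let $\varphi^\star$ be the projection estimator defined in (21). Then, there exist positive
constants $B_0$, $B_1$, $B_2$ and $\tau$, independent of $n$, such that:

$E\|\varphi^\star - \varphi\|^2 \le B_0 \log^2(n) \Big[\inf_m R(m, \varphi)\Big] + \frac{B_1}{n}\big(\log(n)\,\|\varphi\|^2\big)^{2t} + \Omega + \log^2(n)\,\Gamma(\varphi),$

where $\Omega \le B_2(1 + \|\varphi\|^2)\exp\{-\log^{1+\tau} n\}$, $m_0$ denotes the oracle bandwidth and

$\Gamma(\varphi) = \sum_{k=\min(M_0, m_0)}^{m_0} \Big[\varphi_k^2 + \frac{1}{n}\lambda_k^{-2}\sigma_k^2\Big],$ (22)

with the convention $\sum_a^b = 0$ if $a = b$.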

We obtain a non-asymptotic inequality which guarantees that the estimator achieves
the optimal bound, up to a logarithmic factor, among all the estimators that could be
constructed. We point out that we lose a $\log^2(n)$ factor when compared with the bound
obtained in [CH05]. This loss comes from the fact that the error on
the operator is neither deterministic nor even due to an independent noisy observation of the
eigenvalues. Here, the $\lambda_k$'s have to be estimated by the $\hat\lambda_k$'s using the available data. In the
econometric model, both the operator and the regression function are estimated from the
same sample, which leads to strong correlation effects, made explicit in Model (3),
that hamper the rate of convergence of the corresponding estimator.

An oracle inequality only provides information on the asymptotic behaviour of
the estimator if the remainder term $\Gamma(\varphi)$ is of smaller order than the risk of the oracle.
This remainder term models the error made when truncating the eigenvalues, i.e. the error
made when selecting a model close to the random oracle $m_1 \le M$ and not the true oracle $m_0$.
In the next section, we prove that, under some assumptions, this extra term is smaller
than the risk of the estimator.



2.2 Rate of convergence 

To get a rate of convergence for the estimator, we need to specify the regularity of the
unknown function $\varphi$ and compare it with the degree of ill-posedness of the operator $T$,
following the usual conditions in the statistical literature on inverse problems, see for
example [MR96], [CT02] or [BHMR07].

Regularity Condition: Assume that the function $\varphi$ is such that there exist $s$ and a
constant $C$ such that

$\sum_{k \ge 1} k^{2s} \varphi_k^2 < C.$ (23)

This assumption corresponds to functions whose regularity is governed by the smoothness
index $s$. This parameter is unknown and yet governs the rate of convergence. In the special
case where the eigenfunctions are the Fourier basis, this set corresponds to Sobolev
classes. We prove that our estimator achieves the optimal rate of convergence without
prior assumption on $s$.

Corollary 2.2. Let $\varphi^\star$ be the model selection estimator defined in (21). Then, under the
Sobolev embedding assumption (23), we get the following rate of convergence

to) )• 

with $\gamma = 2 + 2s + 2t$.

We point out that $\varphi^\star$ is constructed without prior knowledge of the unknown regularity
$s$ of $\varphi$, yet achieves the optimal rate of convergence, up to logarithmic terms. In
this sense, our estimator is said to be asymptotically adaptive.
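
The bias-variance trade-off that drives this rate can be explored numerically. The sketch below is only an illustration under assumed sequences, none of which come from the paper: $\varphi_k^2 = k^{-2s-2}$ (which satisfies (23)), $\lambda_k = k^{-t}$ and $\sigma_k^2 = 1$. The helper names are hypothetical and the infinite bias sum is truncated at `K` terms.

```python
import numpy as np

def penalized_risk(n, s, t, K=2000):
    """Penalized risk R(m, phi) for m = 1..K under the assumed sequences."""
    k = np.arange(1, K + 1, dtype=float)
    phi2 = k ** (-2.0 * s - 2.0)              # assumed phi_k^2
    var_terms = k ** (2.0 * t)                # lambda_k^{-2} sigma_k^2 with sigma_k = 1
    bias = phi2.sum() - np.cumsum(phi2)       # sum_{k > m} phi_k^2, truncated at K
    variance = (np.log(n) ** 2 / n) * np.cumsum(var_terms)
    return bias + variance

def oracle_cutoff(n, s, t, K=2000):
    """Minimizer over m = 1..K of the penalized risk."""
    return int(np.argmin(penalized_risk(n, s, t, K))) + 1
```

For fixed $s$ and $t$, increasing $n$ shrinks the variance term and moves the minimizer towards larger $m$, while a larger degree of ill-posedness $t$ pushes it back towards small models.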

Remark 2.3. In an equivalent way, we could have imposed a supersmooth assumption
on the function $\varphi$, i.e. assuming that for given $\gamma$, $t$ and constant $C$,

$\sum_{k=1}^{\infty} \exp(2\gamma k^t) \varphi_k^2 < C.$

Following the proof of Corollary 2.2, we obtain that $M_0 > m_0 \sim (a\,2\gamma \log n)^{1/t}$ with $2a\gamma > 1$,
leading to the optimal recovery rate for supersmooth functions in inverse problems.

In conclusion, this work shows that, provided the eigenvectors are known, for smooth
functions $\varphi$, estimating the eigenvalues and using a threshold suffices to get a good
estimator of the regression function in the instrumental variable framework. The price to
pay for not knowing the operator is only an extra $\log^2 n$ with respect to usual inverse
problems and is only due to the correlation induced by the $V_i$'s. One could object that,
when dealing with unknown operators, the knowledge of the eigenvectors is a huge hint.
Some papers have considered the case of completely unknown operators, using a
functional approach, see for instance [DFR03], [FJvB07], but their estimates clearly rely on
smoothness assumptions for the regression function. Hence the two approaches are
complementary, since we provide a more refined adaptive result at the price of stronger assumptions.
Nevertheless, using similar techniques to develop a fully adaptive estimation procedure
would be the last step towards a full understanding of the IV regression model.



3 Technical lemmas 

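First of all, we point out that, throughout the paper, $C$ denotes a generic constant
that may vary from line to line.

Recall that we have introduced

$M = \inf\{k \le N : |\hat\lambda_k| \le \frac{1}{\sqrt{n}} \log n\} - 1.$

The term $N$ provides a deterministic upper bound for $M$ and ensures that $M$ is not
too large. Typically, $N$ is chosen as a power of $n$. The following lemma provides a control of the
bandwidth $M$ by $M_0$ and $M_1$, respectively defined in (9) and (10).

Lemma 3.1. Set $\mathcal{M} = \{M_0 \le M < M_1\}$. Then, for all $n \ge 1$,

$P(\mathcal{M}^c) \le C M_0 e^{-\log^{1+\tau} n},$

where $C$ and $\tau$ denote positive constants independent of $n$.

PROOF. It is easy to see that:

$P(\mathcal{M}^c) = P(\{M < M_0\} \cup \{M \ge M_1\}) \le P(M < M_0) + P(M \ge M_1).$

Using (8) and (10),

$P(M \ge M_1) \le P\Big(|\hat\lambda_{M_1}| \ge \frac{1}{\sqrt{n}} \log n\Big).$

Thanks to the definition of $\hat\lambda_{M_1}$,

$P(M \ge M_1) \le P\Big(|\hat\lambda_{M_1}| - |\lambda_{M_1}| \ge \frac{1}{\sqrt{n}} \log n - |\lambda_{M_1}|\Big)$

$\phantom{P(M \ge M_1)} \le P\Big(\Big|\frac{1}{n}\sum_{i=1}^{n} \{\phi_{M_1}(X_i)\psi_{M_1}(W_i) - E[\phi_{M_1}(X_i)\psi_{M_1}(W_i)]\}\Big| \ge b_n\Big),$

where $b_n = n^{-1/2}\log n - |\lambda_{M_1}|$ for all $n \in \mathbb{N}$. Let $k \in \mathbb{N}$ and $x \in [0, 1]$ be fixed. Assumption
(12) and a Hoeffding-type inequality yield

$P(|\hat\lambda_k - \lambda_k| > x) \le 2\exp\Big\{-\frac{(nx)^2}{2\sum_{i=1}^{n} \mathrm{Var}(\phi_k(X_i)\psi_k(W_i)) + 2nCx/3}\Big\}$

$\phantom{P(|\hat\lambda_k - \lambda_k| > x)} = 2\exp\Big\{-\frac{nx^2}{2\,\mathrm{Var}(\phi_k(X)\psi_k(W)) + 2Cx/3}\Big\}.$

Using again the assumption (12) on the bases $(\phi_k)_{k \in \mathbb{N}}$ and $(\psi_k)_{k \in \mathbb{N}}$,

$\mathrm{Var}(\phi_k(X)\psi_k(W)) \le E[\phi_k^2(X)\psi_k^2(W)] \le C_1^4.$

Hence,

$P(|\hat\lambda_k - \lambda_k| > x) \le 2\exp(-Cnx^2), \quad \forall x \in [0, 1],$ (24)

with $C$ depending on $C_1$.

Using (10), $1 > b_n > 0$ for all $n \in \mathbb{N}$. Therefore, using (24) with $x = b_n$, we obtain:

$P(M \ge M_1) \le 2\exp\{-Cnb_n^2\} \le 2\exp\{-C(\log n - \log^{3/4} n)^2\} \le C\exp\{-\log^{1+\tau} n\},$

where $C$ and $\tau$ denote positive constants independent of $n$.

The bound of $P(M < M_0)$ follows the same lines:

$P(M < M_0) \le P\Big(\bigcup_{j=1}^{M_0} \Big\{|\hat\lambda_j| \le \frac{\log n}{\sqrt{n}}\Big\}\Big) \le \sum_{j=1}^{M_0} P\Big(|\hat\lambda_j| \le \frac{\log n}{\sqrt{n}}\Big).$

Let $j \in \{1, \dots, M_0\}$ be fixed. Then

$P\Big(|\hat\lambda_j| \le \frac{\log n}{\sqrt{n}}\Big) \le P\Big(\frac{1}{n}\sum_{i=1}^{n} \{\phi_j(X_i)\psi_j(W_i) - E[\phi_j(X_i)\psi_j(W_i)]\} \le b_n\Big),$

where $b_n = n^{-1/2}\log n - \lambda_j$ for all $n \in \mathbb{N}$. Thanks to (9), $b_n < 0$ for all $n \in \mathbb{N}$. Using
a Hoeffding-type inequality and Assumption (12):

$P\Big(|\hat\lambda_j| \le \frac{\log n}{\sqrt{n}}\Big) \le \exp\{-Cnb_n^2\} \le C\exp\{-\log^{1+\tau} n\},$

for some $C, \tau > 0$. This concludes the proof of Lemma 3.1.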



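Lemma 3.2. Let $B$ be the event defined by:

$B = \bigcap_{k=1}^{M} \Big\{|\lambda_k^{-1}\mu_k| \le \frac{1}{2}\Big\}, \quad \text{where } \mu_k = \hat\lambda_k - \lambda_k, \ \forall k \in \mathbb{N}^*.$

Then,

$P(B^c) \le C M_1 e^{-\log^{1+\tau} n},$

for some $\tau > 0$ and positive constant $C$.

PROOF. Using simple algebra and Lemma 3.1,

$P(B^c) = P(B^c \cap \mathcal{M}) + P(B^c \cap \mathcal{M}^c)$
$\phantom{P(B^c)} \le P(B^c \cap \mathcal{M}) + P(\mathcal{M}^c)$
$\phantom{P(B^c)} \le P(B^c \cap \mathcal{M}) + C M_0 e^{-\log^{1+\tau} n}.$

Then,

$P(B^c \cap \mathcal{M}) = P\Big(\bigcup_{k=1}^{M} \Big\{|\lambda_k^{-1}\mu_k| > \frac{1}{2}\Big\} \cap \mathcal{M}\Big) \le P\Big(\bigcup_{k=1}^{M_1 - 1} \Big\{|\lambda_k^{-1}\mu_k| > \frac{1}{2}\Big\}\Big).$

Let $k \in \{1, \dots, M_1 - 1\}$ be fixed. Remark that:

$P\Big(|\lambda_k^{-1}\mu_k| > \frac{1}{2}\Big) = P\Big(|\mu_k| > \frac{|\lambda_k|}{2}\Big) \le P\Big(|\hat\lambda_k - \lambda_k| > \frac{1}{2\sqrt{n}}\log^{3/4} n\Big).$

Then, using (24) with $x = (2\sqrt{n})^{-1}\log^{3/4} n$:

$P\Big(|\hat\lambda_k - \lambda_k| \ge \frac{1}{2\sqrt{n}}\log^{3/4} n\Big) \le C e^{-\log^{1+\tau} n},$ (25)

for some $\tau > 0$ and a positive constant $C$. This concludes the proof of Lemma 3.2.

□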

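The following lemma provides some tools for the control of the ratio $\hat\lambda_k^{-1}\lambda_k$ on the
event $B$.

Lemma 3.3. For all $k \le M$, we have:

$\Big(\frac{\lambda_k}{\hat\lambda_k} - 1\Big)^2 \mathbf{1}_B \le 4\lambda_k^{-2}(\hat\lambda_k - \lambda_k)^2 \mathbf{1}_B.$

Moreover, we have the following expansion:

$\frac{\lambda_k}{\hat\lambda_k} = 1 - \lambda_k^{-1}(\hat\lambda_k - \lambda_k) + \lambda_k^{-2}(\hat\lambda_k - \lambda_k)^2 \nu_k,$

where $\nu_k$ is uniformly bounded on the event $B$.

PROOF. Let $k \le M$ be fixed. Then

$\Big(\frac{\lambda_k}{\hat\lambda_k} - 1\Big)^2 \mathbf{1}_B = \Big(\frac{\lambda_k}{\lambda_k + \mu_k} - 1\Big)^2 \mathbf{1}_B \le 4\lambda_k^{-2}(\hat\lambda_k - \lambda_k)^2 \mathbf{1}_B,$

where the $\mu_k$ are defined in Lemma 3.2. The end of the proof is based on a Taylor
expansion of the ratio $\hat\lambda_k^{-1}\lambda_k = (1 + \lambda_k^{-1}\mu_k)^{-1}$. The variable $\nu_k$ depends on $\lambda_k^{-1}\mu_k$ and can
be easily bounded on the event $B$.

□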
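
For the reader's convenience, one explicit choice of the remainder $\nu_k$ can be written down. The computation below is a sketch consistent with the notation $\mu_k = \hat\lambda_k - \lambda_k$ of Lemma 3.2; the proof above does not spell it out, and the shorthand $x_k = \lambda_k^{-1}\mu_k$ is introduced here only for this check.

```latex
\begin{align*}
\frac{\lambda_k}{\hat\lambda_k}
  = \frac{1}{1 + x_k}
  &= 1 - x_k + \frac{x_k^2}{1 + x_k},
  \qquad x_k = \lambda_k^{-1}\mu_k, \\
\text{so that}\quad
\nu_k &= \frac{1}{1 + x_k},
  \qquad
  \tfrac{2}{3} \le \nu_k \le 2
  \quad\text{whenever } |x_k| \le \tfrac{1}{2}.
\end{align*}
```

The same bound $|1 + x_k| \ge 1/2$ on the event $B$ gives $(\lambda_k/\hat\lambda_k - 1)^2 = x_k^2/(1 + x_k)^2 \le 4x_k^2$, which is a squared control of the ratio of the kind used in the first part of the lemma.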



Lemma 3.4. Let $\hat m$ be a random variable measurable with respect to $(Y_i, X_i, W_i)_{i=1,\dots,n}$ such
that $\hat m \le M$. Then, for all $K > 1$ and $\gamma > 0$,



H) E 



j:k'irk-rkr 



k=l 



n 



.k=l 



-2^2 
k 



CNne- 



Hi) E 



.k=l 



k [rk - TkjTk 



n 



.k=l 



-2^2 

k 



+7-^i?(mo, + 7E 



k>m 



where $C > 0$ is a constant independent of $n$, $m_0$ denotes the oracle bandwidth and
$N$ has been introduced in (8).

PROOF. Let $Q > 0$ be a positive term which will be chosen later. With simple algebra:



E 



T.^K'ifk-rkr 



k=l 



EY^k'ih-rkri 



k=l 



< 5e 



n 



E^ 

.k=l 



-2^2 
k ^k 



(rk-rkr<—^ 
2 



k=l 



^Y.^Xf{fk-rkn 



k=l 



(26) 






In the sequel, we are interested in the behaviour of the second term in the right hand side
of (26). Since $\hat\lambda_k^{-2} \le n\log^{-2} n$ for all $k \le M$ and $\hat m \le N$, we obtain:



k=l 



n 

^2 ^ ^ ^ 



TV 



■} " log 



n 



k=l 



Let $k \in \{1, \dots, N\}$ be fixed. It follows from integration by parts that:



E(ffc-rfc)2l 



+ 00 



«4 



P {{fk - Tkf > x) dx. 



Then, 



Assumption (13) together with Bernstein's inequality entails that:



P{\rk-rk\ ^ v^) = P 
^ exp 



1 " 

i=l 



n^x 



exp 



2 Ell y^r{YMW^)) + Cn^ j ' 
nx 



2al + C^i 



Now remark that: 



2al = C-Jx ^ X = D, with D 



2a^ 
~C 



2\ 2 



We obtain: 



D 



nx 



dx + 



dx + 



+ 00 



D 



, exp 



nx 



+00 



exp 



*"fe 



n 



+00 



+ 



exp 



nx 



nx 



D 



exp |— Cni/x} dx 



2al + CVi 
dx, 



dx. 



n 
n 



4(7? n Cn 



-Cn 



Hence, we have 



Eih-Vkfl 



for some $C > 0$. Using (28) and (27),



EY.^K\rk-run 



k=l 



+ e 



-Cn 



(27)

From (26), we eventually obtain:



E 



.k=l 



n 



.k=l 



-2^2 
k ^fc 



+ 



CNn 



log^ n 



-0/4 _^ g-Cn_ 



Choose Q = log (n) in order to conclude the proof of (i). 



Now, consider the bound of (ii). Let $m_0$ be the oracle bandwidth defined in (16). With
the convention $\sum_a^b = -\sum_b^a$ if $b < a$,



k=l 



k=mo 



^ E 



k=mo 



k^{rk - TkYk 



(29) 



fc=l 



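Indeed, $E[\hat r_k] = r_k$ for all $k \in \mathbb{N}$. Then remark that:

$|\mathbf{1}_{\{k \le \hat m\}} - \mathbf{1}_{\{k \le m_0\}}| = |(\mathbf{1}_{\{k \le \hat m\}} + \mathbf{1}_{\{k \le m_0\}})(\mathbf{1}_{\{k \le \hat m\}} - \mathbf{1}_{\{k \le m_0\}})|$
$= (\mathbf{1}_{\{k \le \hat m\}} + \mathbf{1}_{\{k \le m_0\}})\,|\mathbf{1}_{\{k > \hat m\}} - \mathbf{1}_{\{k > m_0\}}|$
$\le \mathbf{1}_{\{k > \hat m\}}\mathbf{1}_{\{k \le m_0\}} + \mathbf{1}_{\{k > m_0\}}\mathbf{1}_{\{k \le \hat m\}}.$ (30)

Using the Cauchy-Schwarz inequality and the fact that, for all $a$, $b$ and $1 > \gamma > 0$,
$2ab \le \gamma a^2 + \gamma^{-1} b^2$: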



k \J'k - TkFk 



k=l 



< J^Y. ^k'rl E J2 K\h - Tk? +J^Y. ^k'rlk J2 - Tkf 



k>rn 



k^mo 



k>mo 



k^m 
mo 



(. k>m fc>mo 

We eventually obtain: 



k^h-rkf + 'E.Y.K^h-rkf 

k=l k=l 



m I m 

\\rk - rk)rk ^ 7"'^("^o, f) + iEY, fl + l'' l^Y. - r^)' 

k=l k>fh I k=l 



We conclude the proof using a string of inequalities similar to (i). In particular, using
Assumption (14), we obtain the bound $\lambda_k^{-2} \le C N^{2t}$ for all $k \le M$.



□ 



Lemma 3.5. Let $\hat m$ be a random variable measurable with respect to $(Y_i, X_i, W_i)_{i=1,\dots,n}$ such
that $\hat m \le M$. Then, for all $\gamma \in (0, 1)$,



-2\ 2 
k Fk 



rt ^ 



7 + 7-1 lQg3/2 



E 



k=l 



n 



.k=l 



2^2 



^ 1 / \og\n).yr 

n\ 7 



2t 



+ log^(n).i?(mo, ip) + 
PROOF. The term in the left hand side can be rewritten as:



fe=i 



fc=i \K 



k=i \\ 



Using Lemma 3.3, we obtain:



eE(V-a 



-2\ 2 
k Fk 



E 



k=l 



,k=l 



E 



k=l 



W1 + W2 



where the $\mu_k$ are defined in Lemma 3.2. First consider the bound of $W_2$. Using (24) with
$x = n^{-1/2}\log n$, we obtain:



E 



^k\ ^^k^k 



.k=l 

n 



C CE 



E2 \ -2 2 



fc=l 



i;=l 



+ entire 



2 - logi+^ ?i 



(31) 



where $C$ denotes a positive constant independent of $n$. Thanks to our assumptions on the
sequence $(\lambda_k)_{k \in \mathbb{N}}$, for all $\gamma > 0$,



Wo C 



2^-logi+-n 



7 



n 



, V^lfEsup A;.^ + C||(y9|pe 

^ k^fh 

C f\og\n).\\^f^ 



k=l ^ 



7 



(32) 



where for the last inequality, we have used (fT^ and the bound: 

supA-^iE^" + ^^" 

fe=l 

with $x = \gamma^{-1}\log^2(n)\,\|\varphi\|^2$. More details on this bound can be found in [CGPT02].

We are now interested in the bound of $W_1$. Using (30) and a string of inequalities
similar to (29), we obtain:



w, = eEv^'a^ 

k=l 
+00 

^ EE l{fe>m}l{fc^mo}<y^fc|-^fe Vfcl + Ey^ l{fc>mo}l{fc^m}V?fc|Afc Vfcl, 



+00 



k=l 



k=l 



^ , P E /E E ^kX^>^ - + E v'I/e E ^^"'(^'^ - 

k>m y fc^mo V fc>mo V fcsCm 



Hence, for all $\gamma > 0$,



mo 



IVi ^ 7 <; E E + E + E E A^'(A. - A.)^ + E E A, ^(A, - A, 

k>fh k>mo J (. k=l k=l 






Using (24) once again with $x = n^{-1/2}\log^{3/4} n$, we obtain for all $\gamma > 0$:
7 log n 



I. k>fh k>mo ) 

This concludes the proof of Lemma 3.5. 



n 



k=l 



k=l 



□ 



Lemma 3.6. Let $\hat m$ be a random variable measurable with respect to $(Y_i, X_i, W_i)_{i=1,\dots,n}$ such
that $\hat m \le M$. Then,



E 



n 



.fc=i 



k (^fe - ^fcJ 



n 



3/2 



.fc=i 



2^2 



1 



E 



n 



E^ 

.A:=l 



-2( 2 _ -2\ 
A: v'k ^k) 



for some $C > 0$ independent of $n$.
PROOF. First remark that, for all $k \ge 1$,



P^r'^ ^2 
(^k - (^k 



n ^-^ 

i=l i=l 
1 " 

1=1 



i=l 



Hence, we obtain 



n 



.k=l 

where for all $k \in \mathbb{N}$:



n 



E^ 

k=l 



k^Pk 



+ iE 

n 



E^ 

.fc=i 



-2/ 2 -2\ 
A; V A; ^fcJ 



(33) 



1 " 

i=l 



We are interested in the first term in the right hand side of (33). Let $\delta > 0$ be a positive
constant which will be chosen later:



-E 



n 



.k=l 



= -E 

n 

n 



E-^fcVfcl{pfe45} 

,fc=i 

fh 

E^. 



1 



-E 



.k=l 



n 



E\%l{pfc><5} 

.fc=i 

E-^fcVfci{pfe>5} 



,A;=1 



Since $\hat m \le M$, integrating by parts,



E 



n 



^K'^Pk'^iPk 



>5} 



.fc=i 



log 



{Pfe>5} 



1 ^ /•+00 

r^E / ^(P'^ ^ 
log ^fit-^^ 
Let $k \in \mathbb{N}$ and $x \ge \delta$ be fixed. Using Bernstein's inequality:



^ exp 
^ exp 
^ exp 



n^x"^ 



2Er=iVar(F,V^(W^,)) + Cxn/3 

2 2 

n X 



2nDQ + Dinx j ' 
nx"^ 



2Do + Dix 



with the hypotheses (13) and (12) on $Y$ and $(\psi_k)_k$. The constants $D_0$ and $D_1$ are positive
and independent of $n$. Therefore, for all $k \le N$,



+ 00 



P{Pk ^ x)dx 

2Do/Di 



exp 



nx 



2Do + Dix 



dx 



exp 



2Do/I5i 



nx 



2Do + Dix 



p2Do/Di r+oo 

^ / exp{— Cnx^}(ix + / exp{— nxjrf, 

J<5 J2D0/D1 

r+00 

^ / exp{— Cn(5x}(ix H — e 
75 ^ 



I2D0/D1 

-Cn 



^ -^expi-n^n + n-^e-^", 
no 

for some $C > 0$. Choosing $\delta = n^{-1/2}\log n$ and using Assumption (15), we obtain:



n 



.fc=i 



fcVfc 



.fc=i 



2^2 



We use (33) in order to conclude the proof.



□ 



4 Proofs 

Proof of Theorem 2.1. The proof of our main result can be decomposed into four steps.
First, we prove that the quadratic risk of $\varphi^\star$ is close, up to some residual terms,
to $E R(m^\star, \varphi)$ where

1 2 

V) = Y.^I + ^Y. ^^'^"^l ^ (34) 

k>m k=l 

This result is uniform in $m$ and justifies our choice of $R(m, \varphi)$ as a criterion for the
bandwidth selection.

Second, we show that $E R(m^\star, \varphi)$ and $E U(m^\star, r, \hat\lambda)$ are in some sense
comparable. Then, according to the definition of $m^\star$ in (20),

$U(m^\star, r, \hat\lambda) \le U(m, r, \hat\lambda), \quad \forall m \le M.$

We will conclude the proof by proving that, for all $m \le M$, $E U(m, r, \hat\lambda) = E\|\hat\varphi_m - \varphi\|^2$,
up to a log term and some residual terms.

In order to begin the proof, remark that:



k - "Pk) 



k=l k>m* 

This is the usual bias-variance decomposition. Then 



k=l 



^k) 



k=l 



k=l 

m* m* 

^ 2E ^ Xfih - + 2E ^(A^Vfc - = Ti + T2. 



k=l 



k=l 



Concerning $T_2$, we use the following approach. For all $\gamma > 0$, using Lemma 3.3 and the
bounds (31) and (32):



k=l 



k=i ^^fe 

2 m* / , s 2 



k=l / k=l J 



/ 1 ) (^fclfic, 



< He 



E\ -2 2 2 



1], 



fc=l ^ 



II log [n] 

7 



2t 



(35) 



where $\mu_k = \hat\lambda_k - \lambda_k$ for all $k \in \mathbb{N}$. The term $T_1$ is bounded using Lemma 3.4 with $\hat m = m^\star$
and $K = 2$. Hence, for all $\gamma > 0$,



E\\ip*-^\\' ^ (l + 7)Ei^(m^(/?) 



n 



7 



2t 



(36) 



where $R(m^\star, \varphi)$ is introduced in (34). This concludes the first step of our proof.

Now, our aim is to write $E R(m^\star, \varphi)$ in terms of $E U(m^\star, r, \hat\lambda)$:
EC/(m*,r, 



E 



= E 



m* ■, 2 ™* 



jfc=i 

m* 



k 



k=l 



1 2 ™ 
x-2 2 , log ^ C-2 2 

fe=l 



'k ^fe 



E 



.k=l 



log^ n 
n 



E 



fc=i 



= E 



1 2 ™ 

E2 log n ^ 2 

.A;>m* k=l 



Ikir-E 



.k=l 



log^n 
n 



E 



X]^fe^(^^fc - ^k) 

.k=l 



Hence, 



.k=l 



log^ n_ 
H — - — E 
n 



.k=l 



Remark that: 
E 



E 



= E 



.k=l 
m* 



E 



j:ck' - Kyi 



.k=l 



Yl k^ii^k - rkf + 2{fk - rk)rk} 



.k=l 

Using simple algebra: 



+ E 



-2\ 2 
k Fk 



.k=l 



k=l 

m* m* 

= ^Y.K\h - ^k)rk + E J](A^2 _ ^-2^^^^ _ ^^y^^ 
fc=l fc=l 

m* m* 

= ^J2K\rk - rk)rk + E^(A,-^ - X^^kiK' + X^'Kh - n), 

k=l k=l 

m* m* m* 



k=l 



k=l 



k=l 
Hence
E 



.k=l 



^ CE 



+E 



.k=l 
m* 



2E 



k=l 



2\ 2 
k Pk 



k=l 



k=i ^-^fc 



1 V't 



Using Lemmas 3.4 and 3.5 and (35), we obtain, for all $1 > \gamma > 0$ and $K > 1$:



E 



,A;=1 



^ (27-Mog^n + C7"Mog^/"n + 7) .^E 



.fc=i 



(38) 



+7-^i?(mo, y^) + 7E 



_k>m* 



+ Q + C7-^iV-+ie-'-"" + ^ f I^SMM 

7 



n 



2t 



Remark that this result can be obtained for all $\hat m$ measurable with respect to the sample
$(X_i, Y_i, W_i)_{i=1,\dots,n}$. Then, from (37) and Lemma 3.6,



Ei?(m*, (p) 

^ E[/(m^ r,^) + yf+ (2^-' log^ n + C^^ \og^'^ n + C^^) - 

\ n^i'^ J n 



E 



.fc=i 



+7 ^i?(mo,V5) +7E 



,k>Tn* 



'1 Ar2i+l„-los^n _^ _^ :i 



n \ 7 



2t 



which can be rewritten: 

V 7 

with 



2i 



(39) 



p(7, K, n) = 27-^ log^-2 n + — - + log"^/^ n + 7. 

The third step of our proof can be easily derived from the definition of m* and leads to 
the following result: 



AT2t+l^- log^ n _^Q_^ 

n 



C [\og\n).\\^f 



7 



2t 



,(40) 



where $m_1$ is defined in (17) and denotes the oracle in the family $\{1, \dots, M\}$. In order to
conclude the proof, we have to compute $E U(m_1, r, \hat\lambda) + \|\varphi\|^2$. First, remark that:



E[/(mi,r, V9) + 



E 



E 



+E 



k=l 



mi 1 2 "11 

k=l 

nil 

k=l 
mi 



'2 ;i2 



2 2 



+ 



2 + 

n 



mi 



^2^2 



,fc=l 



2 -2^ 



fc=i 



n 



m 



k=l 



Hence, 



= E 



, 2 ™-i 
2 log \ -2 2 



.k>mi 



'k ""k 



k=l 



+ E 



k=l 



log^ n _ 
+— — E 

n 



mi 



.k=l 

ER(mi,^) + Fi + F2. 



The same bound as (38) holds for $F_1$. In the same way, using Lemma 3.6:



F, = i^E 

n 



Therefore, for all ^ 1, 



mi 



k=l 

mi 



k=l 



2^2 



^ mi 

-eVa 



"2/ 2 -2\ 
fc Vk ''^kJ 



fc=l 



EU{m^,r,ip) + yf ^ (^l + Clog^-2n + j Ei?(mi, yp) + i?(mo. 



n V 7 

Using (40) and (41), we eventually obtain:
(l-p(7,i^,iV))Ei?K,¥p) 
< (^1 + log^'2 ^ ^ ^-^^^^^ Ei?(mi, (^) + C-f~'ER{mo, ^) 



2t 



+ n. (41) 



2i 



-1 Ar2t+l„-log-^n 



< C\og\n).ER{mi,ip) + C-f-'ER{mo,v) + C-i-'N'^'+'e 



+ - 

n 



C flogHn] 



1 



2t 



< C\og^{n).R{mo,^) + \og^{n).T{^) + 



C flog' in). 



n 



7 



2t 
for some positive constant $C$, where $\Gamma(\varphi)$ is introduced in Theorem 2.1. With an appropriate
choice of $K$, this leads to:



E||V9* - Lfl 

^ Clog^(n).i?(mi,<^) + ^ r ^"^'^""!''^'' ) +n + log\n).T{^). 



□ 



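Proof of Corollary 2.2. We start by recalling the oracle inequality obtained for the
estimator $\varphi^\star$:

$E\|\varphi^\star - \varphi\|^2 \le B_0 \log^2(n)\Big[\inf_m R(m, \varphi)\Big] + \frac{B_1}{n}\big(\log(n)\,\|\varphi\|^2\big)^{2t} + \Omega + \log^2(n)\,\Gamma(\varphi).$

We have to bound the risk under the regularity condition, as well as the extra term $\log^2(n)\,\Gamma(\varphi)$.
Recall that the risk is given by

$R(m, \varphi) = \sum_{k > m} \varphi_k^2 + \frac{\log^2 n}{n}\sum_{k=1}^{m} \lambda_k^{-2}\sigma_k^2.$

Hence, under (23), we obtain the following upper bounds for two constants $C_1$ and $C_2$:

$\sum_{k > m} \varphi_k^2 \le C_1 m^{-2s},$

$\frac{\log^2 n}{n}\sum_{k=1}^{m} \lambda_k^{-2}\sigma_k^2 \le C_2 \frac{\log^2 n}{n} m^{2t+1}.$

An optimal choice is given by $m = [(n/\log^2 n)^{1/(1+2s+2t)}]$, leading to the desired rate of
convergence.

Now consider the remainder term $\Gamma(\varphi)$. Under Assumption [IP], $M_0 \ge [n^{1/(2t)}/\log^{2/t} n]$,
but since $m_0 = [n^{1/(1+2s+2t)}]$, we clearly get $m_0 \le M_0$, which entails that $\Gamma(\varphi) = 0$.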